Effects of Multiple-Choice and Short-Answer Tests
on Delayed Retention Learning 

W. J. Haynie, III
Abstract 

This research investigated the value of short-answer in-class tests as
learning aids. Undergraduate students (n=187) in 9 technology education
classes were given information booklets concerning “high-tech” materials
without additional instruction. The control group was not tested initially.
Students in the experimental groups were given either a multiple-choice or a
short-answer in-class test when they returned the booklets. All groups were
tested for delayed retention three weeks later. The delayed retention test
included subtests of previously tested and new information. Both short-answer
and multiple-choice tests were more effective than no test in promoting delayed
retention learning. No difference was found between short-answer and
multiple-choice tests as learning aids on the subtest of information which had
not been tested on the initial tests; however, multiple-choice tests were more
effective in promoting retention learning of the information actually
contained in the immediate posttests.

This study compared two types of teacher-made in-class tests
(multiple-choice and short-answer) with a no-test (control) condition to
determine their relative effectiveness as aids to retention learning (that
learning which is still retained weeks after the initial instruction and
testing have occurred). The investigation involved instruction via self-paced
texts, initial testing of learning, and delayed testing 3 weeks later. The
delayed tests, which included both previously tested information and novel
information that had not been previously
tested, provided the experimental data for the study. 

Background 

Testing plays a central role in education, which makes it an important topic
of continuing research. As technology education evolves to emphasize more
cognitive learning, the time devoted to testing and the effects of testing will
become increasingly important. Most of the research on testing which has been
reported in recent years has concerned standardized tests (Bridgeford, Conklin,
and Stiggins, 1986). Most of the evaluation done in schools, however, is done
with teacher-made tests (Haynie, 1983, 1991, 1992; Herman & Dorr-Bremme, 1982;
Mehrens, 1987; Mehrens & Lehmann, 1987; Newman & Stallings, 1982). The
available findings on the quality of teacher-made tests cast some doubt on the
ability of teachers to perform evaluation effectively (Burdin, 1982; Carter,
1984; Fleming & Chambers, 1983; Gullickson & Ellwein, 1985; Haynie, 1992;
Stiggins & Bridgeford, 1985; Wiggins, 1993). Despite these problems, Mehrens
and Lehmann (1987) point out the importance of teacher-made tests in the 
classroom to evaluate attainment of specific instructional objectives. Evaluation 
by teacher-made tests in schools is an important part of the educational system 
and a crucial area for research (Haynie, 1990a, 1990b, 1991, 1992; Mehrens & 
Lehmann, 1987; Wiggins, 1993). 

One method of testing that is popular in many educational settings, yet has
received little attention in the literature, is the use of short-answer test
items. Short-answer items are relatively easy to prepare (Haynie, 1983) and may
be scored more quickly than essay items. They are not as objective as
multiple-choice items because they sometimes do not give adequate information
to evoke the desired response even from students who know the subject well.
Despite this limitation, they may be useful on teacher-made tests because there
is good evidence to suggest that many teachers are not capable of authoring
truly clear and effective multiple-choice items (Haynie, 1983, 1992). Since
many teachers do use short-answer items, their usefulness in promoting
retention learning is worthy of research.

Multiple-choice tests, take-home tests, and post-test reviews have all been 
shown to promote retention learning in previous studies (Haynie, 1990a, 1990b, 
1991, in press; Nungester & Duchastel, 1982). However, announcements of an 
upcoming test did not have a positive effect on retention learning without a test 
actually being given. It appears that increased studying due to anticipation of a 
test did not result in better retention — only the act of taking the test increased 
retention (Haynie, 1990a). No studies were found that investigated the effects of
short-answer tests on retention learning, which is the thrust of this research.
Research on the effects of tests on retention learning within the context of
technology education classes and the value of the learning time they consume is
limited to the studies cited above.

Purpose and Definition of Terms 

The purpose of this study was to investigate the value of in-class
multiple-choice and short-answer tests as aids to retention learning.
“Retention learning” as used here refers to learning which lasts beyond the
initial testing; it is assessed with tests administered 2 or more weeks after
the information has been taught and tested. A delay period of 3 weeks was used
in this study. “Initial testing” refers to the commonly employed evaluation by
testing which occurs at the time of instruction or immediately thereafter.
“Delayed retention tests” are research instruments which are administered 2 or
more weeks after instruction and initial testing to measure retained knowledge
(Dwyer, 1968; Dwyer, 1973; Duchastel, 1981; Nungester & Duchastel, 1982;
Haynie, 1990a, 1990b, 1991, in press). The delayed retention test results were
the only data analyzed in this investigation.

In addition to studying the relative gains in retention learning acquired by 
students while they take a test, an effort was made here to determine whether 
information which has been studied but which does not actually appear on the 
immediate posttest will be retained in addition to that material which is on the 
test. This study also examined whether multiple-choice and short-answer tests 
differ in their effectiveness for promoting retention of both tested and untested 
material. The research questions posed and addressed by this study were: 

1. If delayed retention learning is the objective of instruction, does initial
   testing of the information aid retention learning?

2. Does initial testing by short-answer tests aid retention learning as
   effectively as initial testing with multiple-choice tests?

3. Will information which is not represented on initial testing be learned
   equally well by students tested via short-answer and multiple-choice tests?

Methodology 

Population and Sample 

Undergraduate students in 9 intact technology education classes were
provided a booklet on new “high-tech” materials developed for space exploration.
There were 187 students divided into three groups: (a) Multiple-choice test
(Group A, n=63), (b) Short-answer test (Group B, n=64), and (c) No test
(Control, Group C, n=60). All groups were from the Technology Education
metals technology (TED 122) classes at North Carolina State University.
Students were majors in Technology Education, Design, or in various engineering
curricula. Students majoring in Aerospace Engineering were deleted from
the final sample because much of the material was novel to other students but 
had previously been studied by this group. All groups were team taught by the 
researcher and his graduate assistant. Treatments were randomly assigned to 
each section. Random assignment, deletion of students majoring in Aerospace 
Engineering, and absences on testing dates resulted in final group sizes which 
were slightly unequal. 

Design 

At the beginning of the course it was announced that students would be
asked to participate in an experimental study and that they would be learning
subject matter reflected in the newly revised course outline while doing so. It
was also pointed out, however, that formal tests had not been prepared on the
added material, so this portion of the course would not be considered when
determining course grades except to ensure that they made a “good, honest
attempt.” All other instructional units in the course were learned by students
working in self-paced groups and taking subtests on the units as they studied
them. The subtests were administered on three examination dates. The
experimental study did not begin until after the first of the three examination
dates to ensure that students could see (and believe) that none of the eight
regular subtests reflected the newly added subject matter.

During the class period following the first examination date, the subtests 
which had been taken were reviewed and instructions for participation in the 
experimental study were given. All students were given copies of a 34-page
study packet prepared by the researcher. The packet was titled “High
Technology Materials” and it discussed composite materials, heat shielding
materials, and non-traditional metals developed for the space exploration
program and illustrated their uses in consumer products. The packet was in
booklet form. It included the following resources typically found in textbooks:
(a) a table of contents, (b) text (written by the researcher), (c) halftone
photographs, (d) quotations from other sources, (e) diagrams and graphs,
(f) numbered pages, (g) excerpts from other sources, and (h) an index with 119
entries correctly keyed to the page numbers inside. Approximately one-third of
the information in the text booklet was actually reflected in the tests. The
remainder of the material appeared to be equally relevant but served as a
complex distracting field to prevent mere memorization of facts. Students were
instructed to use
the booklet as if it were a textbook and study as they normally would. 

Six intact class sections over a two year period were randomly assigned to 
Group A or Group B (three each). Both groups were told to study the packet 
and that they would be asked to take a test on the material in class one week 
later. Students were told that participation was voluntary and the tests would 
not affect their grades. Both groups were requested to return the packets on the 
test date also. Students were told that the purpose of the study was to examine 
the types of answers given on the tests to see if there was a difference in the 
way questions were approached. They were also again told that the results 
would not affect their course grades and that participation was voluntary. 

In order to obtain a control group, three randomly selected sections of
students in the same course during the two semesters of the next year were given
the same initial instructions. However, instead of announcing a test, the teacher
told the students that the material was newly added to the course and no
subtests had been prepared yet, so they were simply lucky: they would be
expected to study the material as if they would be tested, but they would not
actually be tested. It is acknowledged that the participation of these students
in a different year than the other two groups could have been a confounding
variable; however, they did come from three intact class sections with the
same teachers as the other groups. It was felt that this was the only way to
ensure that students truly believed they would not be tested on the material. If
they had been mingled with the other two groups, they would have readily seen
that some sort of testing had to occur sometime or there would be no data for
the experiment from their group. This would have also spoiled the effectiveness
of the evasive statements to the other two (experimental) groups that “types of
answers” on the tests were the data of interest.

Three weeks later, all groups were asked to take an unannounced delayed
retention test on the same material. They were told at this time that the true
objective of the experimental study was to see which type of test (or no test)
promoted delayed retention learning best, and that their earlier tests, if any,
were not a part of the study data in any way. Students were told again that
participation was voluntary. They were again asked to do their best and reminded
that it did not affect their grades.

The same room was used for all groups during instructional and testing 
periods and while directions were given. This helped to control extraneous 
variables due to environment. The same teacher provided all directions and
neither teacher administered any instruction in addition to the texts. Students
were asked not to discuss the study or the text materials in any way. All class
sections met for 2 hours on a Monday-Wednesday-Friday schedule. Some students
in each group were in 8:00 a.m. to 10:00 a.m. sections and the others were in 
10:00 a.m. to 12:00 noon sections, so neither time of day nor day of the week 
should act as confounding variables. Equal numbers of Fall Semester and 
Spring Semester students were assigned to each group. Normal precautions 
were taken to assure a good learning and testing environment. 

Instrumentation 

The initial tests were parallel forms of a single 20-item test. The
short-answer version was identical to the multiple-choice form except that there
were no alternatives from which to choose responses and brief prose answers were
required. Multiple-choice items had five response alternatives. The same
information was reflected in both tests. It must be noted that, in general,
short-answer tests tend to be used more often and appear to be more effective
with lower level types of learning (Haynie, 1983); therefore, the information in
this study was taught and tested primarily at the first three levels of the
cognitive domain: (a) knowledge, (b) comprehension, and (c) application.

The delayed retention test was a 30-item multiple-choice test. Twenty of
the items in the retention test were alternate forms of the same items used on
the initial in-class test. These served as a subtest of previously tested
information. The remaining ten items were similar in nature and difficulty to
the others, but they had not appeared in any form on either of the initial
tests. These were interspersed throughout the test and they served as a subtest
of new information. The subtest on new information was used to determine if
retention learning gains were made during the study period or during the process
of actually taking the tests. Assuming that all of the information had been
originally studied with relatively equal diligence, this information should be
learned equally by all groups. If the type of test employed affected retention
learning gains, then one of the tested groups would be expected to outperform
the other one on the subtest of previously tested information.

The delayed retention test was developed and used in a previous study 
(Haynie, 1990). It had been refined from an initial bank of 76 paired items and 
examined carefully for content validity. Cronbach's Coefficient Alpha
procedure was used to establish a reliability of .74 for the delayed retention
test. Item
analysis detected no weak items in the delayed retention test. Thorndike and 
Hagen (1977) assert that tests with reliability approaching .70 are within the 
range of usefulness for research studies. 
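
For readers unfamiliar with the reliability figure above, the following is a minimal sketch of how Cronbach's coefficient alpha is computed from item-level scores. The score matrix is invented purely for illustration, since the study's item-level data are not reported, and the function name cronbach_alpha is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student rows of item scores."""
    k = len(scores[0])                      # number of items on the test
    items = list(zip(*scores))              # transpose: one tuple per item
    # Population variance is used throughout; sample variance would give the
    # same alpha as long as it is applied to both numerator and denominator.
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 0/1 (wrong/right) scores: 5 students by 4 items.
demo_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(round(cronbach_alpha(demo_scores), 2))
```

Alpha rises toward 1.0 as the items rank students more consistently, which is why a value of .74 is read as acceptable internal consistency for research use.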

Data Collection 

Students were given initial instructions concerning the learning booklets
and directed when to return the booklets and take the tests. The in-class
immediate posttests were administered on the same day that the booklets were
collected. The unannounced delayed retention test was administered three weeks
later. Data were collected on mark-sense forms from National Computer
Systems, Inc.

Data Analysis 

The data were analyzed with SAS (Statistical Analysis System) software
from the SAS Institute, Inc. on a microcomputer. The answer forms were
electronically scanned and data stored on floppy disk. The General Linear Models
(GLM) procedure of SAS was chosen for omnibus testing rather than analysis
of variance (ANOVA) because it is less affected by unequal group sizes. A
simple one-way GLM analysis was chosen because the only data consisted of
the Delayed Retention Test means of the three groups. The means of the two
subtest sections were then similarly analyzed by the one-way GLM procedure to
detect differences in retention of previously tested and novel information.
Follow-up comparisons were conducted via Least Significant Difference t-test
(LSD) as implemented in SAS. Alpha was set at .05 for all tests of
significance.


Findings 

The means, standard deviations, and final sizes of the three groups on the
delayed retention test (including the two subtests and the total scores) are
presented in Table 1. The overall difficulty of the test battery and each
subtest can be estimated by examining the grand means and the range of scores.

The grand mean of all participants was 16.63 with a range of 3 to 28 on 
the total 30 item test. The grand mean on the 20 item subtest of previously 
tested material was 12.32 with a range of 2 to 20, and the grand mean on the 
10 item subtest of new information was 4.31 with a range of 0 to 9. No student 
scored 100% on the entire test and the grand means were close to 50% on each 
test, so the tests were relatively difficult. The grand means, however, were not 
used in any other analysis of the data. 

Table 1

Means, Standard Deviations, and Sample Sizes

                                                  Subtests
                                     -----------------------------------
                     Total Test      Previously Tested   New Information
Treatment            Mean    SD      Mean    SD          Mean    SD
------------------------------------------------------------------------
Group A              19.05   4.00    14.05   2.89        5.00    1.95
Multiple-Choice
n=63

Group B              16.86   4.72    12.48   3.28        4.38    2.03
Short-Answer
n=64

Group C              13.85   4.57    10.33   3.14        3.52    1.97
No Test Control
n=60

Overall              16.63   4.43    12.32   3.10        4.31    1.98
n=187
------------------------------------------------------------------------


The GLM procedure was used to compare the 3 treatment groups (Group
A, Multiple-choice Test; Group B, Short-answer Test; and Group C, Control)
on the means of the total delayed retention test scores. A significant difference
was found among the total test means: F(2, 184) = 21.16, p<.0001, R-Square =
.19.

Following this significant finding, the GLM procedure was again employed
to examine the means of each subtest. Significant differences were found
among the means on the subtest of previously tested information, F(2, 184) =
22.07, p<.0001, R-Square = .19, and among the means on the subtest of new
information, F(2, 184) = 8.64, p<.0003, R-Square = .08.
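
As a rough consistency check, the omnibus F ratios and R-Square values can be approximated from the rounded summary statistics in Table 1. The sketch below is not the SAS GLM procedure used in the study; it is a plain one-way analysis of variance rebuilt from group sizes, means, and standard deviations, and the function name oneway_from_summary is hypothetical. Because the inputs are rounded to two decimals, the output should be expected to agree with the reported values only approximately.

```python
def oneway_from_summary(groups):
    """One-way ANOVA F ratio and R-squared from (n, mean, sd) per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: weighted squared distance of each
    # group mean from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    r_square = ss_between / (ss_between + ss_within)
    return f_ratio, r_square, df_between, df_within


# (n, mean, SD) for Groups A, B, and C, copied from Table 1.
table1 = {
    "total test": [(63, 19.05, 4.00), (64, 16.86, 4.72), (60, 13.85, 4.57)],
    "previously tested": [(63, 14.05, 2.89), (64, 12.48, 3.28), (60, 10.33, 3.14)],
    "new information": [(63, 5.00, 1.95), (64, 4.38, 2.03), (60, 3.52, 1.97)],
}

for name, groups in table1.items():
    f_ratio, r_square, df1, df2 = oneway_from_summary(groups)
    print(f"{name}: F({df1}, {df2}) = {f_ratio:.2f}, R-Square = {r_square:.2f}")
```

The degrees of freedom come out as 2 and 184 for all three analyses, matching the values reported above, and the F ratios land within about 0.1 of the published figures.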


Journal of Technology Education, Vol. 6 No. 1, Fall 1994

Follow-up comparisons were conducted via t-test (LSD) procedures in SAS.
The results of the LSD comparisons are shown in Table 2. The critical value
used was t(184) = 1.97, p<.05. In the total test scores and both subtests
(previously tested and new information), the means of the two treatment groups
which were previously tested, Group A (Multiple-choice Test) and Group B
(Short-answer Test), were both significantly greater than the means of Group C
(No Test Control).

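Table 2

Contrasts of Group Means Via LSD Procedures

                                      Groups and Means
                       ----------------------------------------------
                       Group C        Group B          Group A
                       No Test        Short-Answer     Mult.-Choice
---------------------------------------------------------------------
Total Test             13.85          16.86            19.05

Subtests:
  Previously Tested    10.33          12.48            14.05

  New Information       3.52           4.38             5.00
                                      ---------------------
---------------------------------------------------------------------

Note. Means not underlined were significantly lower at the .05 level.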

LSD follow-up comparisons also showed that Groups A and B did not differ
significantly in their retention of the new information (10-item subtest of
information which was not previously tested), but that Group A (Multiple-choice
Test) outscored Group B (Short-answer Test) significantly on the 20-item subtest
of previously tested information and on the total test.
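
The pairwise LSD contrasts can be sketched in the same spirit. The example below assumes the usual LSD form, in which each pair of means is compared with a t statistic whose standard error uses the pooled within-groups mean square from the omnibus analysis. It uses the rounded new-information subtest values from Table 1 and the critical value of 1.97 stated in the text; the function names are hypothetical, and this is an approximation rather than the SAS output.

```python
from math import sqrt


def pooled_ms_within(groups):
    """Pooled within-groups mean square from (n, mean, sd) per group."""
    df_within = sum(n for n, _, _ in groups) - len(groups)
    return sum((n - 1) * sd ** 2 for n, _, sd in groups) / df_within


def lsd_t(group_i, group_j, ms_within):
    """LSD t statistic for one pair of groups, using the pooled error term."""
    (n_i, mean_i, _), (n_j, mean_j, _) = group_i, group_j
    return (mean_i - mean_j) / sqrt(ms_within * (1 / n_i + 1 / n_j))


T_CRITICAL = 1.97  # critical value reported in the text for 184 df

# (n, mean, SD) on the new-information subtest, copied from Table 1.
new_info = {"A": (63, 5.00, 1.95), "B": (64, 4.38, 2.03), "C": (60, 3.52, 1.97)}
msw = pooled_ms_within(list(new_info.values()))

for first, second in [("A", "B"), ("A", "C"), ("B", "C")]:
    t = lsd_t(new_info[first], new_info[second], msw)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"Group {first} vs Group {second}: t = {t:.2f} ({verdict})")
```

With these rounded inputs, only the Group A versus Group B contrast falls below the critical value, which is consistent with the pattern of results described above.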

Discussion 

The first of three research questions addressed by this study was: If delayed 
retention learning is the objective of instruction, does initial testing of the 
information aid retention learning? Within the constraints of this study, testing 
of instructional material did promote retention learning. Two types of tests 
were shown to be effective in supporting retention learning. The question could 
be raised whether it was the actual act of taking the test which aided retention 
learning or if the knowledge that a test was forthcoming motivated students to 
study more effectively. This was a central research question of a previous study 
(Haynie, 1990a) in which announcements of the intention to test were evaluated 
and shown not to be effective in promoting retention learning unless they were 
actually followed by tests or reviews. No attempt was made in this study to
separate the effectiveness of prior knowledge concerning upcoming tests from
gains made while studying for and taking the tests.

The second research question was: Does initial testing by short-answer 
tests aid retention learning as effectively as initial testing with multiple-choice 
tests? The findings presented here provide evidence that multiple-choice tests 
promote retention learning more effectively than do short-answer tests. Both 
Group A and Group B scored significantly higher than the control (no test) 
group on the total test and both subtests. However, multiple-choice tests appear 
to be more effective in promoting retention learning than are short-answer tests 
as shown by the finding of significantly higher scores for Group A on the 
subtest of previously tested information. This may be because the correct
answer to each item is provided along with the distractors in the
multiple-choice items, whereas students had no cues to help them remember the
answers, or even reconsider the issues, in the short-answer test items. Moving
information from short-term to long-term memory is aided by rehearsal, and it
appears that multiple-choice test items are a more effective form of rehearsal
than short-answer test items.

An alternate conclusion would be that the students who took the
multiple-choice test performed better simply because the delayed retention test
was in the same (multiple-choice) form. Further research should be conducted to
examine this factor. The recommendation given here is to choose the type of test
which is best suited to the educational objectives and trust that when it is
used for evaluation, it will also aid in promotion of retention learning.
However, if it is desirable to maximize the promotion of retention learning,
then use of multiple-choice items on the test may be preferred over short-answer
items. This does assume, however, that the multiple-choice items used will be
good test items which are devoid of the errors in item development shown in
previous research on test items authored by teachers (Haynie, 1992).

The final research question was: Will information which is not represented 
on initial testing be learned equally well by students tested via short-answer and 
multiple-choice tests? The delayed retention test used in this experiment 
contained a subtest of ten items interspersed throughout the test which had not 
appeared in any form on the initial tests. If the two types of test were equal in 
effectiveness, then both the subtests of new and of pretested information should 
have found no differences between the groups except for poorer performance by 
the control group. Alternatively, if one type of test were superior in promotion 
of retention learning, then one experimental group should outscore the other 
one on the subtest of previously tested material, but not on the subtest of new 
material. 

Although the tests were short, there was no significant difference in the
performance of the two previously tested groups on the ten item subtest of new
information. Though both of these groups outscored the control (no test) group
significantly on this subtest of novel information, there was no difference
between the two experimental groups. So, short-answer and multiple-choice
tests were equally effective in promoting retention of information which
was studied but which was not actually reflected in the test items. The
conclusion here is that, if short-answer tests are well suited to the type of
learning objectives being tested from an evaluation viewpoint, then well
developed short-answer tests should be as effective as multiple-choice tests in
promoting retention learning of incidental information.

Recommendations 

Since testing requires considerable amounts of student and teacher time in
the schools, it is important to maximize every aspect of the evaluation process.
The ability of teachers to develop and use tests effectively has been called into
question recently; however, most research on testing has dealt with standardized
tests. The whole process of producing, using, and evaluating teacher-made
tests is in need of research.

This study was limited to one educational setting. It used learning
materials and tests designed to teach and evaluate a limited number of specified
objectives concerning one body of subject matter. The sample used in this study
may have been unique for unknown reasons. Therefore, studies similar in design
which use different materials and are conducted with different populations
will be needed to achieve more definite answers to these research questions.
However, on the basis of this one study, it is recommended that: (a) when useful
for evaluation purposes, classroom testing should continue to be employed
due to its positive effect on retention learning; (b) both multiple-choice and
short-answer tests promote retention learning, although multiple-choice tests
are more effective in this regard; and (c) teachers who use short-answer tests
need not be overly concerned that students will only benefit from the
learning of those specific facts represented on the test to the exclusion of
information not represented, because both short-answer and multiple-choice tests
were shown to be equal in their ability to promote retention of material which
was studied but not actually included on the test. So, if the instructor wishes to
maximize the potential gains in retention made while students take a test,
multiple-choice tests should be used; however, if short-answer tests are more
appropriate for the evaluation situation present, their use will also benefit
students' retention, although to a lesser degree. The ability of the individual
instructor to develop good multiple-choice test items should be considered in
making this decision.

Short-answer tests may have advantages of their own which make them
useful in some situations because they do not force students to choose from a
predetermined set of responses. Though some of the research examined in the
review of literature for this study was critical of short-answer tests, the fact that
teachers have difficulty authoring effective multiple-choice items may make
short-answer items a better choice for many situations. This study did not
examine the effect of post-test reviews when using short-answer tests. Such
reviews have been shown to be helpful in promoting retention of information
tested via multiple-choice tests. The effects (on retention) of post-test reviews
following short-answer tests should be addressed in future research.

Testing, pre-test reviewing, post-test reviewing, and occasional retesting
require large amounts of learning time. As technology education moves away
from the traditional “shop” setting of industrial arts and toward a more
conceptually based curriculum, the teaching and testing of cognitive information
increases in importance. More of the time of students and teachers will be
consumed by testing and related activities such as pre- and post-test reviews.
Technology teachers should understand how to make this time beneficial for
learning as well as for evaluation. Technology teacher educators should help
preservice and inservice teachers learn how to maximize the learning potential
of time devoted to testing and reviews. The value of tests in promoting
retention learning has been demonstrated here and two research questions about
the types of tests to use for specific purposes within the context of technology
education classes have been addressed; however, there remain many more potential
questions about all sorts of teacher-made tests. The tests used in this study were
carefully developed to resemble and perform similarly to teacher-made tests in
most regards; however, there are still research questions which can be answered
only on the basis of tests actually produced by teachers for use in
their natural settings. The process of pre-test reviewing, testing, and post-test
reviewing is too time consuming to be ignored. Continued research must be
conducted to determine the best ways to test and review so as to meet the needs
of evaluation and to maximize retention of important learning in technology
education and in other disciplines.